DecisionFlow: Advancing Large Language Model as Principled Decision Maker

A modular framework for interpretable and context-grounded automated decision making with LLMs

Published

May 27, 2025

Authors: X. Chen et al.
Published on Arxiv: 2025-05-27
Link: http://arxiv.org/abs/2505.21397v1
Institutions: University of Illinois Urbana-Champaign
Keywords: DecisionFlow, large language models, decision modeling, symbolic reasoning, explainable AI, medical triage, financial decision making, agriculture planning, utility-based reasoning, Chain-of-Thought, alignment, interpretability

Random Unsplash-style image

Large Language Models (LLMs) are increasingly deployed in high-stakes environments such as healthcare and finance. However, these models often lack transparent, structured reasoning which is essential for reliable and interpretable decisions. Instead of inherently explainable justifications, current LLMs tend to offer post-hoc explanations, highlighting a significant gap between symbolic reasoning techniques and language model capabilities.
To address this gap, the authors propose DecisionFlow—a step-by-step framework enhancing LLM decision-making through structured modeling.
- DecisionFlow guides LLMs to transform natural language scenarios into structured representations, including the identification of actions, attributes, and relevant constraints such as ethical rules or available resources. - The approach infers a latent utility function for context-aware scoring of potential actions and generates rationales directly from the structured reasoning process, thereby boosting transparency and alignment. The framework is modular and uses solely prompt-based methods, making it adaptable to various LLMs.
To validate this approach, extensive experiments were conducted on domains like medical triage (MTA benchmark) and decision tasks in agriculture and stocks (DeLLMa benchmark).
- DecisionFlow significantly outperforms standard baselines such as Zero-shot, Chain-of-Thought prompting, Self-Consistency, and domain-specific methods—recording up to 30% better accuracy. - The results display high consistency across metrics such as alignment, robustness, and human-evaluated interpretability, with up to 90.5% accuracy on medical triage and 76.67% in agriculture (using GPT-4o). Ablation studies highlight the crucial contributions of each modular stage.
Overall, this research illustrates that integrating symbolic, utility-based reasoning with LLMs can deliver more interpretable and reliable decision support for critical applications.
- DecisionFlow equips LLMs with human-like, principled decision-making, marking improvements in accuracy, fairness, and alignment. - The authors suggest avenues for future work including reducing error propagation, exploring joint or end-to-end optimization, and developing advanced fine-tuning or collaborative strategies.